instrument recognition
Persian Musical Instruments Classification Using Polyphonic Data Augmentation
Esfangereh, Diba Hadi, Sameti, Mohammad Hossein, Moridani, Sepehr Harfi, Javidpour, Leili, Baghshah, Mahdieh Soleymani
Musical instrument classification is essential for music information retrieval (MIR) and generative music systems. However, research on non-Western traditions, particularly Persian music, remains limited. We address this gap by introducing a new dataset of isolated recordings covering seven traditional Persian instruments, two common but originally non-Persian instruments (i.e., violin, piano), and vocals. We propose a culturally informed data augmentation strategy that generates realistic polyphonic mixtures from monophonic samples. Using the MERT model (Music undERstanding with large-scale self-supervised Training) with a classification head, we evaluate our approach on out-of-distribution data obtained by manually labeling segments of traditional songs. On real-world polyphonic Persian music, the proposed method yielded the best ROC-AUC (0.795), highlighting complementary benefits of tonal and temporal coherence. These results demonstrate the effectiveness of culturally grounded augmentation for robust Persian instrument recognition and provide a foundation for culturally inclusive MIR and diverse music generation systems.
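The core idea of building polyphonic training examples from monophonic clips can be illustrated with a minimal sketch. It only performs plain additive mixing with peak normalisation and a multi-hot label; it does not implement the tonal or temporal coherence constraints that the abstract describes. The function name `mix_monophonic`, the `peak` parameter, and the class names in the usage example are assumptions made for illustration, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def mix_monophonic(
    clips: Sequence[Tuple[str, Sequence[float]]],
    classes: Sequence[str],
    peak: float = 0.9,
) -> Tuple[List[float], List[int]]:
    """Sum monophonic clips into one mixture and build its multi-hot label.

    `clips` holds (instrument_name, samples) pairs (hypothetical layout).
    """
    if not clips:
        raise ValueError("need at least one clip")
    # Truncate every clip to the length of the shortest one.
    length = min(len(samples) for _, samples in clips)
    mixture = [0.0] * length
    for _, samples in clips:
        for i in range(length):
            mixture[i] += samples[i]
    # Peak-normalise so the summed signal stays within [-peak, peak].
    max_abs = max((abs(x) for x in mixture), default=0.0)
    if max_abs > 0:
        scale = peak / max_abs
        mixture = [x * scale for x in mixture]
    # One label bit per class: 1 if that instrument is in the mixture.
    present = {name for name, _ in clips}
    label = [1 if c in present else 0 for c in classes]
    return mixture, label


# Usage with hypothetical class names and toy three-sample "clips".
classes = ["tar", "ney", "piano"]
clips = [("tar", [0.5, 0.5, 0.0]), ("ney", [0.5, -0.5, 0.2])]
mixture, label = mix_monophonic(clips, classes)
```

A real pipeline would operate on audio arrays and would choose which clips to combine (for example by key or tempo) before mixing; that selection step is where a culturally informed strategy would apply.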
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Asia > Middle East > Iran > Golestan Province > Gorgan (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
A Hierarchical Deep Learning Approach for Minority Instrument Detection
Sechet, Dylan, Bugiotti, Francesca, Kowalski, Matthieu, d'Hérouville, Edouard, Langiewicz, Filip
Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability. Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level. Based on the Hornbostel-Sachs classification, such a hierarchical classification system is evaluated using the MedleyDB dataset, renowned for its diversity and richness concerning various instruments and music genres. This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction. This study showcases more reliable coarse-level instrument detection by bridging the gap between detailed instrument identification and group-level recognition, paving the way for further advancements in this domain.
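One simple way to connect instrument-level and group-level predictions is to aggregate fine-grained scores into scores for their parent families. The sketch below takes the maximum over each family's members. This is an illustrative assumption, not one of the models the paper tests, and the small `FAMILY_OF` mapping (using Hornbostel-Sachs family names) is hypothetical and far from complete.

```python
from typing import Dict

# Hypothetical, partial mapping from instrument to Hornbostel-Sachs family.
FAMILY_OF: Dict[str, str] = {
    "violin": "chordophone",
    "cello": "chordophone",
    "flute": "aerophone",
    "snare drum": "membranophone",
}


def coarse_scores(fine: Dict[str, float]) -> Dict[str, float]:
    """Aggregate per-instrument scores into per-family scores via max."""
    out: Dict[str, float] = {}
    for instrument, score in fine.items():
        family = FAMILY_OF[instrument]  # raises KeyError for unmapped names
        out[family] = max(out.get(family, 0.0), score)
    return out


# Usage: a family is as active as its most active member.
result = coarse_scores({"violin": 0.2, "cello": 0.7, "flute": 0.1})
```

Max aggregation means a family counts as detected when any one of its instruments is; other choices, such as a noisy-OR over the members, are equally plausible.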
- Europe > United Kingdom > England > Surrey > Guildford (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
MIRFLEX: Music Information Retrieval Feature Library for Extraction
Chopra, Anuradha, Roy, Abhinaba, Herremans, Dorien
This paper introduces an extendable modular system that compiles a range of music feature extraction models to aid music information retrieval research. The features include musical elements like key, downbeats, and genre, as well as audio characteristics like instrument recognition, vocals/instrumental classification, and vocals gender detection. The integrated models are state-of-the-art or latest open-source. The features can be extracted as latent or post-processed labels, enabling integration into music applications such as generative music, recommendation, and playlist generation. The modular design allows easy integration of newly developed systems, making it a good benchmarking and comparison tool. This versatile toolkit supports the research community in developing innovative solutions by providing concrete musical features.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Singapore (0.05)
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.05)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition
Vasilakis, Yannis, Bittner, Rachel, Pauwels, Johan
Music two-tower multimodal systems integrate audio and text modalities into a joint audio-text space, enabling direct comparison between songs and their corresponding labels. These systems enable new approaches for classification and retrieval, leveraging both modalities. Despite the promising results they have shown for zero-shot classification and retrieval tasks, closer inspection of the embeddings is needed. This paper evaluates the inherent zero-shot properties of joint audio-text spaces for the case-study of instrument recognition. We present an evaluation and analysis of two-tower systems for zero-shot instrument recognition and a detailed analysis of the properties of the pre-joint and joint embeddings spaces. Our findings suggest that audio encoders alone demonstrate good quality, while challenges remain within the text encoder or joint space projection. Specifically, two-tower systems exhibit sensitivity towards specific words, favoring generic prompts over musically informed ones. Despite the large size of textual encoders, they do not yet leverage additional textual context or infer instruments accurately from their descriptions. Lastly, a novel approach for quantifying the semantic meaningfulness of the textual space leveraging an instrument ontology is proposed. This method reveals deficiencies in the systems' understanding of instruments and provides evidence of the need for fine-tuning text encoders on musical data.
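Zero-shot classification in a joint audio-text space typically reduces to picking the label whose text embedding is closest to the audio embedding. The sketch below shows that decision rule with cosine similarity on toy vectors. The function names and the two-dimensional embeddings are assumptions for illustration; real systems obtain the vectors from trained audio and text encoders, which are not modelled here.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def zero_shot_label(
    audio_emb: Sequence[float], text_embs: Dict[str, Sequence[float]]
) -> str:
    """Return the label whose text embedding is most similar to the audio."""
    return max(text_embs, key=lambda name: cosine(audio_emb, text_embs[name]))


# Usage with toy embeddings (hypothetical values).
text_embs = {"guitar": [0.9, 0.0], "flute": [0.0, 1.0]}
predicted = zero_shot_label([1.0, 0.1], text_embs)
```

Because the prediction depends entirely on where the text encoder places each prompt, changing the prompt wording changes `text_embs` and can change the result, which is the sensitivity the abstract reports.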
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- Leisure & Entertainment (1.00)
- Media > Music (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data
Koo, Junghyun, Chae, Yunkee, Jeon, Chang-Bin, Lee, Kyogu
Music source separation (MSS) faces challenges due to the limited availability of correctly labeled individual instrument tracks. With the push to acquire larger datasets to improve MSS performance, encountering mislabeled individual instrument tracks becomes inevitable and is a significant challenge to address. This paper introduces an automated technique for refining the labels in a partially mislabeled dataset. Our proposed self-refining technique, employed with a noisy-labeled dataset, results in only a 1% accuracy degradation in multi-label instrument recognition compared to a classifier trained on a clean-labeled dataset. The study demonstrates the importance of refining noisy-labeled data in MSS model training and shows that using the refined dataset leads to results comparable to those obtained with a clean-labeled dataset. Notably, when only a noisy dataset is available, MSS models trained on a self-refined dataset even outperform those trained on a dataset refined with a classifier trained on clean labels.
- Media > Music (0.70)
- Leisure & Entertainment (0.70)
- Education (0.68)
Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio
Kim, Kyungsu, Park, Minju, Joung, Haesun, Chae, Yunkee, Hong, Yeongbeom, Go, Seonghyeon, Lee, Kyogu
As digital music production has become mainstream, the selection of appropriate virtual instruments plays a crucial role in determining the quality of music. To find the musical instrument samples or virtual instruments that make the desired sound, music producers use their ears to listen to and compare each instrument sample in their collection, which is time-consuming and inefficient. In this paper, we call this task Musical Instrument Retrieval and propose a method for retrieving desired musical instruments using a reference music mixture as a query. The proposed model consists of the Single-Instrument Encoder and the Multi-Instrument Encoder, both based on convolutional neural networks. The Single-Instrument Encoder is trained to classify the instruments used in single-track audio, and we take its penultimate layer's activation as the instrument embedding. The Multi-Instrument Encoder is trained to estimate multiple instrument embeddings using the instrument embeddings computed by the Single-Instrument Encoder as a set of target embeddings. For more generalized training and realistic evaluation, we also propose a new dataset called Nlakh. Experimental results showed that the Single-Instrument Encoder was able to learn the mapping from the audio signal of unseen instruments to the instrument embedding space and that the Multi-Instrument Encoder was able to extract multiple embeddings from the music mixture and retrieve the desired instruments successfully. The code used for the experiment and audio samples are available at: https://github.com/minju0821/musical_instrument_retrieval
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Music Instrument Classification Reprogrammed
Chen, Hsin-Hung, Lerch, Alexander
The performance of approaches to Music Instrument Classification, a popular task in Music Information Retrieval, is often limited by the lack of annotated training data. We propose to address this issue with "reprogramming," a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model. We demonstrate that reprogramming can effectively leverage the power of the representation learned for a different task and that the resulting reprogrammed system can perform on par with or even outperform state-of-the-art systems with a fraction of the training parameters. Our results, therefore, indicate that reprogramming is a promising technique potentially applicable to other tasks impeded by data scarcity.
- Research Report > Promising Solution (0.66)
- Research Report > New Finding (0.66)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
ChMusic: A Traditional Chinese Music Dataset for Evaluation of Instrument Recognition
Gong, Xia, Zhu, Yuxiang, Zhu, Haidi, Wei, Haoran
Musical instrument recognition is a widely used application of music information retrieval. Because most previous musical instrument recognition datasets focus on Western instruments, it is difficult for researchers to study and evaluate traditional Chinese musical instrument recognition. This paper proposes a traditional Chinese music dataset for model training and performance evaluation, named ChMusic. The dataset is free and publicly available; it contains recordings of 11 traditional Chinese musical instruments and 55 traditional Chinese music excerpts. An evaluation standard based on the ChMusic dataset is then proposed. With this standard, researchers can evaluate their results under the same rules, making results from different researchers comparable.
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > Texas (0.04)
- North America > United States > Mississippi (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Kratimenos, Agelos, Avramidis, Kleanthis, Garoufis, Christos, Zlatintsi, Athanasia, Maragos, Petros
Instrument classification is one of the fields in Music Information Retrieval (MIR) that has attracted a lot of research interest. However, most of that work deals with monophonic music, while efforts on polyphonic material mainly focus on predominant instrument recognition or multi-instrument recognition for entire tracks. We present an approach for instrument classification in polyphonic music using monophonic training data that involves mixing-augmentation methods. Specifically, we experiment with pitch- and tempo-based synchronization, as well as mixes of tracks with similar music genres. Further, a custom CNN model that uses the augmented training data efficiently is proposed, and a range of suitable evaluation metrics is discussed. The tempo-sync and genre techniques stand out, achieving a label ranking average precision of 81% and detecting up to 9 instruments in over 2300 testing tracks.
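Label ranking average precision, the metric behind the 81% figure, can be sketched in a few lines of pure Python. The version below follows the common definition (the one also used by scikit-learn's `label_ranking_average_precision_score`), treats ties with `>=`, and, as a simplifying assumption, rejects samples that have no true label. It may differ in detail from the authors' evaluation code; the function name `lrap` is hypothetical.

```python
from typing import Sequence


def lrap(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """Label ranking average precision over a batch of multi-label samples."""
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("y_true and y_score must be non-empty and aligned")
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        if not relevant:
            raise ValueError("each sample needs at least one true label")
        sample_sum = 0.0
        for j in relevant:
            # Rank of label j: how many labels score at least as high.
            rank = sum(1 for s in scores if s >= scores[j])
            # How many of those are true labels (including j itself).
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            sample_sum += hits / rank
        total += sample_sum / len(relevant)
    return total / len(y_true)


# Usage: the first sample's true label is ranked 2nd, the second sample's 3rd.
value = lrap([[1, 0, 0], [0, 0, 1]], [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])
```

In the usage example the per-sample precisions are 1/2 and 1/3, so the batch value is their mean, 5/12.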
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
- Europe > Greece (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Musical Instrument Recognition Using Their Distinctive Characteristics in Artificial Neural Networks
Toghiani-Rizi, Babak, Windmark, Marcus
In this study, an artificial neural network was trained to classify musical instruments using audio samples transformed to the frequency domain. Different features of the sound, in both the time and frequency domains, were analyzed and compared with respect to how much information could be derived from that limited data. The study concluded that, compared with the base experiment, which had an accuracy of 93.5%, using only the attack resulted in 80.2% accuracy and using only the initial 100 Hz in 64.2%.
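Restricting the input to the attack and moving it to the frequency domain can be illustrated as follows: keep only the first part of the signal and compute its magnitude spectrum. This is a minimal sketch with a naive discrete Fourier transform; the attack duration, the function names, and the absence of windowing are assumptions, since the abstract does not specify how the study extracted its features.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(samples: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n//2 (naive O(n^2) transform)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def attack_spectrum(
    samples: Sequence[float], sample_rate: int, attack_seconds: float
) -> List[float]:
    """Magnitude spectrum of only the first `attack_seconds` of the signal."""
    n_attack = int(round(sample_rate * attack_seconds))
    segment = samples[:n_attack]
    if not segment:
        raise ValueError("attack segment is empty")
    return dft_magnitudes(segment)


# Usage: a toy 2 Hz cosine sampled at 8 Hz; keep the first second only.
sample_rate = 8
signal = [math.cos(2 * math.pi * 2 * t / sample_rate) for t in range(16)]
mags = attack_spectrum(signal, sample_rate, attack_seconds=1.0)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
```

With an 8-sample segment at 8 Hz each bin spans 1 Hz, so the energy of the 2 Hz tone falls in bin 2. A practical implementation would use an FFT routine and real recordings rather than this toy signal.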
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > New Finding (0.68)
- Instructional Material > Course Syllabus & Notes (0.46)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)